<html>
<head>
<title>Treating new classes and new discriminations encountered in data point descriptions
</title>
</head>

<body>
<h1>Treating new classes and new discriminations encountered in data point descriptions
</h1>

<p>When BOXER parses XML files containing descriptions of data points
(data set files), it needs to map Discrimination:Class labels
associated with data points in those files onto existing classes of
existing <a href="api/edu/dimacs/mms/boxer/Discrimination.html">discriminations</a> of the
current <a href="api/edu/dimacs/mms/boxer/Suite.html">Suite</a>. In an ideal situation, all
labels BOXER encounters in data set files would refer to
discriminations and classes that have already been explicitly created
by the BOXER application - either as a result of reading the suite
definition XML file, or by API calls. However, BOXER also has
provisions for dealing with labels that have <em>not</em> been
previously defined. The designer of a BOXER application can choose among
several modes for dealing with them, ranging from saying, "All
labels must have been defined in advance, and any 'new' label is just
an error", to "I don't want to bother to define a suite in advance,
and want BOXER to assemble it based on the labels it finds". What is
actually done in any particular situation depends on what the parsed
data are meant for, the settings of the current suite, and the
properties of individual discriminations.


<p>There are two parsing modes: "definitional parsing" and
"non-definitional parsing". The "definitional" mode can be used by an
application when it wants BOXER to use the class labels contained in
the dataset files to create new discriminations and/or add classes to
existing ones. Such an application may want to use definitional
mode to parse, at least, its training sets before the DataPoint
objects are fed to the learner's absorbExample() method.

<p>The "non-definitional parsing" mode is meant to process data sets
while making no, or only minimal, changes to the suite. It is
particularly suitable for parsing test examples, since one typically
may not want them to affect the suite used by the classifiers.

<p>The parsing process is outlined by the following table.

<p>
<table border=1>
<tr>
<th rowspan=2>Parsing mode
<th rowspan=2>New discrimination</th> 
<th colspan=4>New class in a discrimination with the class structure of...
<tr>
<th>DCS0 (Uncommitted)
<th>DCS1 (Fixed), or DCS2 (Bounded) with no anon classes left
<th>DCS2 (Bounded) with some anon classes available
<th>DCS3 (Unbounded)
<tr>
<th>Definitional
<td>Create a DCS0 discrimination, and be ready to add new classes. The discrimination will be committed (to DCS1) later, the next time the classifiers are trained
<td>Add class
<td>Map to leftovers class if available; otherwise error/ignore
<td>Appropriate an anon class
<td>Add class
<tr>
<th>Non-definitional
<td>Error/ignore
<td colspan=3>Map to leftovers class if available; otherwise error/ignore
<td>Error/ignore (there are no leftovers classes in DCS3)
</table>

<p>
The above table makes no reference to a "committed" or "uncommitted" suite, because we presently (ver 0.6.003) make no such distinction.
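
<p>The table can also be read as a small decision function. The
following Java sketch is only an illustration of that reading and is
not part of the BOXER API: the class <code>NewLabelPolicy</code>, its
enums, and its method names are all hypothetical, and the "DCS2 with
no anon classes left" case is modeled as a separate enum value purely
for convenience.

```java
// Hypothetical sketch: none of these names come from the BOXER API.
public class NewLabelPolicy {
    /** The two parsing modes described in the text. */
    enum Mode { DEFINITIONAL, NON_DEFINITIONAL }

    /** Columns of the table; DCS2 is split by whether anon classes remain. */
    enum Structure {
        DCS0_UNCOMMITTED, DCS1_FIXED, DCS2_BOUNDED_FULL,
        DCS2_BOUNDED_ANON_LEFT, DCS3_UNBOUNDED
    }

    /** Cells of the table. */
    enum Action {
        CREATE_DISCRIMINATION, ADD_CLASS, MAP_TO_LEFTOVERS,
        APPROPRIATE_ANON_CLASS, ERROR_OR_IGNORE
    }

    /** "New discrimination" column. */
    static Action forNewDiscrimination(Mode mode) {
        return mode == Mode.DEFINITIONAL
            ? Action.CREATE_DISCRIMINATION
            : Action.ERROR_OR_IGNORE;
    }

    /** "New class" columns; hasLeftovers says whether a leftovers class was declared. */
    static Action forNewClass(Mode mode, Structure s, boolean hasLeftovers) {
        if (s == Structure.DCS3_UNBOUNDED) {
            // There are no leftovers classes in DCS3, so hasLeftovers is not consulted.
            return mode == Mode.DEFINITIONAL ? Action.ADD_CLASS : Action.ERROR_OR_IGNORE;
        }
        Action fallback = hasLeftovers ? Action.MAP_TO_LEFTOVERS : Action.ERROR_OR_IGNORE;
        if (mode == Mode.NON_DEFINITIONAL) {
            return fallback; // same cell for DCS0, DCS1 and both DCS2 cases
        }
        switch (s) {
            case DCS0_UNCOMMITTED:
                return Action.ADD_CLASS;
            case DCS2_BOUNDED_ANON_LEFT:
                return Action.APPROPRIATE_ANON_CLASS;
            default:
                return fallback; // DCS1, or DCS2 with no anon classes left
        }
    }

    public static void main(String[] args) {
        System.out.println(forNewDiscrimination(Mode.DEFINITIONAL));
        System.out.println(forNewClass(Mode.DEFINITIONAL, Structure.DCS2_BOUNDED_ANON_LEFT, false));
        System.out.println(forNewClass(Mode.NON_DEFINITIONAL, Structure.DCS1_FIXED, true));
    }
}
```

<p>Whether an <code>ERROR_OR_IGNORE</code> result becomes an exception
or a silently dropped label is decided separately, by the suite-level
flag described in the explanations.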

<p>Explanations:
<ul>
<li>"Error/ignore" means that the action is controlled by an
appropriate flag (a property of the suite). "Error" means throwing an
exception, while "ignore" means behaving exactly as if the offending
label simply weren't there.
<li>"Appropriate an anon class" means giving a name to one of the
still-anonymous classes of a DCS2 discrimination, using the newly
encountered label.
<li>"Map to leftovers class" means behaving as if the label were the
name of the leftovers class of the discrimination. A discrimination
has a leftovers class only if, in its definition, one of its classes
has been defined as such. The typical example is a discrimination
"Countries" with three classes, "USA", "Canada", and "Foreign", the
last one declared as the leftovers class. Any class label other than
"USA" or "Canada" will be converted to "Foreign" by the parser.
</ul>
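
<p>The "Countries" example above can be sketched in a few lines of
Java. This is a stand-alone illustration, not BOXER code: the class
<code>LeftoversExample</code> and its <code>resolve</code> method are
hypothetical, and a <code>null</code> result stands in for the
"error/ignore" case that applies when no leftovers class was declared.

```java
// Hypothetical sketch: not part of the BOXER API.
public class LeftoversExample {
    /**
     * Returns the class a label resolves to within one discrimination.
     * A known label resolves to itself; an unknown label resolves to the
     * leftovers class, or to null if the discrimination has none
     * (the caller would then apply the error/ignore policy).
     */
    static String resolve(String[] classes, String leftovers, String label) {
        for (String c : classes) {
            if (c.equals(label)) {
                return c;
            }
        }
        return leftovers;
    }

    public static void main(String[] args) {
        String[] countries = { "USA", "Canada", "Foreign" };
        // "Foreign" is declared as the leftovers class.
        System.out.println(resolve(countries, "Foreign", "Canada"));
        System.out.println(resolve(countries, "Foreign", "Mexico"));
        // Same classes, but no leftovers class declared.
        System.out.println(resolve(countries, null, "Mexico"));
    }
}
```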

<hr>
<p>
Back to <a href="boxer-user-guide.html">BOXER User Guide</a>


</body>

</html>

